{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Copyright 2019 Google LLC\n",
    "#\n",
    "# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
    "# you may not use this file except in compliance with the License.\n",
    "# You may obtain a copy of the License at\n",
    "#\n",
    "# https://www.apache.org/licenses/LICENSE-2.0\n",
    "#\n",
    "# Unless required by applicable law or agreed to in writing, software\n",
    "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
    "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
    "# See the License for the specific language governing permissions and\n",
    "# limitations under the License."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# PyTorch Training\n",
    "This notebook trains a model to predict whether the given sonar signals are bouncing off a metal cylinder or off a cylindrical rock from [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/Connectionist+Bench+%28Sonar%2C+Mines+vs.+Rocks%29).\n",
    "\n",
    "## The data\n",
    "The Sonar Signals dataset that this sample uses for training is provided by the UC Irvine Machine Learning Repository. Google has hosted the data on a public GCS bucket `gs://cloud-samples-data/ml-engine/sonar/sonar.all-data`.\n",
    "\n",
    "* `sonar.all-data` is split for both training and evaluation\n",
    "\n",
    "Note: Your typical development process with your own data would require you to upload your data to GCS so that you can access that data from inside your notebook. However, in this case, Google has put the data on GCS to avoid the steps of having you download the data from UC Irvine and then upload the data to GCS.\n",
    "\n",
    "### Disclaimer\n",
    "This dataset is provided by a third party. Google provides no representation, warranty, or other guarantees about the validity or any other aspects of this dataset.\n",
    "\n",
    "# Build your model\n",
    "First, you'll create the model (provided below). This is similar to your normal process for creating a PyTorch model. However, there is one key difference:\n",
    "\n",
    "Downloading the data from GCS at the start of your file, so that you can access the data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from google.cloud import storage\n",
    "import pandas as pd\n",
    "import torch\n",
    "import torch.optim as optim\n",
    "import torch.nn as nn\n",
    "from torch.utils.data import Dataset\n",
    "from torch.utils.data import DataLoader\n",
    "from torch.utils.data import random_split"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Add code to download the data from GCS (in this case, using the publicly hosted data). You will then be able to use the data when training your model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Public bucket holding the census data\n",
    "bucket = storage.Client().bucket('cloud-samples-data')\n",
    "\n",
    "# Path to the data inside the public bucket\n",
    "blob = bucket.blob('ml-engine/sonar/sonar.all-data')\n",
    "# Download the data\n",
    "blob.download_to_filename('sonar.all-data')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Read in the data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Define the dataset to be used by PyTorch\n",
    "class SonarDataset(Dataset):\n",
    "    def __init__(self, csv_file):\n",
    "        self.dataframe = pd.read_csv(csv_file, header=None)\n",
    "\n",
    "    def __len__(self):\n",
    "        return len(self.dataframe)\n",
    "\n",
    "    def __getitem__(self, idx):\n",
    "        # When iterating through the dataset get the features and targets\n",
    "        features = self.dataframe.iloc[idx, :-1].values.astype(dtype='float64')\n",
    "\n",
    "        # Convert the targets to binary values:\n",
    "        # R = rock --> 0\n",
    "        # M = mine --> 1\n",
    "        target = self.dataframe.iloc[idx, -1:].values\n",
    "        if target[0] == 'R':\n",
    "            target[0] = 0\n",
    "        elif target[0] == 'M':\n",
    "            target[0] = 1\n",
    "        target = target.astype(dtype='float64')\n",
    "\n",
    "        # Load the data as a tensor\n",
    "        data = {'features': torch.from_numpy(features),\n",
    "                'target': target}\n",
    "        return data\n",
    "\n",
    "\n",
    "# Load the data\n",
    "sonar_dataset = SonarDataset('./sonar.all-data')\n",
    "# Create indices for the split\n",
    "dataset_size = len(sonar_dataset)\n",
    "test_size = int(0.2 * dataset_size)  # Use a test_split of 0.2\n",
    "train_size = dataset_size - test_size\n",
    "\n",
    "# Split the dataset\n",
    "train_dataset, test_dataset = random_split(sonar_dataset,\n",
    "                                           [train_size, test_size])\n",
    "# Create our Dataloaders for training and test data\n",
    "train_loader = DataLoader(\n",
    "    train_dataset.dataset,\n",
    "    batch_size=4,\n",
    "    shuffle=True)\n",
    "test_loader = DataLoader(\n",
    "    test_dataset.dataset,\n",
    "    batch_size=4,\n",
    "    shuffle=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This is where your model code would go. Below is an example model using the census dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "torch.manual_seed(42)\n",
    "\n",
    "# Create the Deep Neural Network\n",
    "class SonarDNN(nn.Module):\n",
    "    def __init__(self):\n",
    "        super(SonarDNN, self).__init__()\n",
    "\n",
    "        self.net = nn.Sequential(\n",
    "            nn.Linear(60, 60),\n",
    "            nn.ReLU(),\n",
    "            nn.Dropout(p=0.2),\n",
    "            nn.Linear(60, 30),\n",
    "            nn.ReLU(),\n",
    "            nn.Dropout(p=0.2),\n",
    "            nn.Linear(30, 1),\n",
    "            nn.Sigmoid()\n",
    "        )\n",
    "\n",
    "    def forward(self, x):\n",
    "        return self.net(x)\n",
    "\n",
    "# Create the model\n",
    "net = SonarDNN().double()\n",
    "optimizer = optim.SGD(net.parameters(),\n",
    "                      lr=0.01,\n",
    "                      momentum=0.5,\n",
    "                      nesterov=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Define the training loop"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def train(net, train_loader, optimizer, epoch):\n",
    "    \"\"\"Create the training loop\"\"\"\n",
    "    net.train()\n",
    "    criterion = nn.BCELoss()\n",
    "    running_loss = 0.0\n",
    "\n",
    "    for batch_index, data in enumerate(train_loader):\n",
    "        features = data['features']\n",
    "        target = data['target']\n",
    "\n",
    "        # zero the parameter gradients\n",
    "        optimizer.zero_grad()\n",
    "        # forward + backward + optimize\n",
    "        outputs = net(features)\n",
    "        loss = criterion(outputs, target)\n",
    "        loss.backward()\n",
    "        optimizer.step()\n",
    "\n",
    "        # print statistics\n",
    "        running_loss += loss.item()\n",
    "        if batch_index % 6 == 5:  # print every 6 mini-batches\n",
    "            print('[%d, %5d] loss: %.3f' %\n",
    "                  (epoch, batch_index + 1, running_loss / 6))\n",
    "            running_loss = 0.0"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Define the testing loop"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def test(net, test_loader):\n",
    "    \"\"\"Test the DNN\"\"\"\n",
    "    isp = False\n",
    "    net.eval()\n",
    "    criterion = nn.BCELoss()  # https://pytorch.org/docs/stable/nn.html#bceloss\n",
    "    test_loss = 0\n",
    "    correct = 0\n",
    "\n",
    "    with torch.no_grad():\n",
    "        for i, data in enumerate(test_loader, 0):\n",
    "            features = data['features']\n",
    "            target = data['target']\n",
    "            if not isp:\n",
    "                isp = True\n",
    "                print(features)\n",
    "                print(target)\n",
    "            output = net(features)\n",
    "            # Binarize the output\n",
    "            pred = output.apply_(lambda x: 0.0 if x < 0.5 else 1.0)\n",
    "            test_loss += criterion(output, target)  # sum up batch loss\n",
    "            correct += pred.eq(target.view_as(pred)).sum().item()\n",
    "\n",
    "    test_loss /= len(test_loader.dataset)\n",
    "    print('\\nTest set:\\n\\tAverage loss: {:.4f}'.format(test_loss))\n",
    "    print('\\tAccuracy: {}/{} ({:.0f}%)\\n'.format(\n",
    "            correct,\n",
    "            (len(test_loader) * test_loader.batch_size),\n",
    "            100. * correct / (len(test_loader) * test_loader.batch_size)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Train / Test the model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "epochs = 10\n",
    "for epoch in range(1, epochs + 1):\n",
    "    train(net, train_loader, optimizer, epoch)\n",
    "    test(net, test_loader)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Export the trained model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "torch.save(net.state_dict(), 'model.pth')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "! ls -al model.pth"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Run a simple prediction with set values:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "rock_feature = torch.tensor([[3.6800e-02, 4.0300e-02, 3.1700e-02, 2.9300e-02, 8.2000e-02, 1.3420e-01,\n",
    "         1.1610e-01, 6.6300e-02, 1.5500e-02, 5.0600e-02, 9.0600e-02, 2.5450e-01,\n",
    "         1.4640e-01, 1.2720e-01, 1.2230e-01, 1.6690e-01, 1.4240e-01, 1.2850e-01,\n",
    "         1.8570e-01, 1.1360e-01, 2.0690e-01, 2.1900e-02, 2.4000e-01, 2.5470e-01,\n",
    "         2.4000e-02, 1.9230e-01, 4.7530e-01, 7.0030e-01, 6.8250e-01, 6.4430e-01,\n",
    "         7.0630e-01, 5.3730e-01, 6.6010e-01, 8.7080e-01, 9.5180e-01, 9.6050e-01,\n",
    "         7.7120e-01, 6.7720e-01, 6.4310e-01, 6.7200e-01, 6.0350e-01, 5.1550e-01,\n",
    "         3.8020e-01, 2.2780e-01, 1.5220e-01, 8.0100e-02, 8.0400e-02, 7.5200e-02,\n",
    "         5.6600e-02, 1.7500e-02, 5.8000e-03, 9.1000e-03, 1.6000e-02, 1.6000e-02,\n",
    "         8.1000e-03, 7.0000e-03, 1.3500e-02, 6.7000e-03, 7.8000e-03, 6.8000e-03]],  dtype=torch.float64)\n",
    "rock_prediction = net(rock_feature)\n",
    "\n",
    "mine_feature = torch.tensor([[5.9900e-02, 4.7400e-02, 4.9800e-02, 3.8700e-02, 1.0260e-01, 7.7300e-02,\n",
    "         8.5300e-02, 4.4700e-02, 1.0940e-01, 3.5100e-02, 1.5820e-01, 2.0230e-01,\n",
    "         2.2680e-01, 2.8290e-01, 3.8190e-01, 4.6650e-01, 6.6870e-01, 8.6470e-01,\n",
    "         9.3610e-01, 9.3670e-01, 9.1440e-01, 9.1620e-01, 9.3110e-01, 8.6040e-01,\n",
    "         7.3270e-01, 5.7630e-01, 4.1620e-01, 4.1130e-01, 4.1460e-01, 3.1490e-01,\n",
    "         2.9360e-01, 3.1690e-01, 3.1490e-01, 4.1320e-01, 3.9940e-01, 4.1950e-01,\n",
    "         4.5320e-01, 4.4190e-01, 4.7370e-01, 3.4310e-01, 3.1940e-01, 3.3700e-01,\n",
    "         2.4930e-01, 2.6500e-01, 1.7480e-01, 9.3200e-02, 5.3000e-02, 8.1000e-03,\n",
    "         3.4200e-02, 1.3700e-02, 2.8000e-03, 1.3000e-03, 5.0000e-04, 2.2700e-02,\n",
    "         2.0900e-02, 8.1000e-03, 1.1700e-02, 1.1400e-02, 1.1200e-02, 1.0000e-02]],  dtype=torch.float64)\n",
    "mine_prediction = net(mine_feature)\n",
    "\n",
    "# Note: Try increasing the number of epochs above to see more accurate results. \n",
    "print('Result Values: (Rock: 0) - (Mine: 1)')\n",
    "print('Rock Prediction:\\n\\t{} - {}'.format('Rock' if rock_prediction <= 0.5 else 'Mine', rock_prediction.item()))\n",
    "print('Mine Prediction:\\n\\t{} - {}'.format('Rock' if mine_prediction <= 0.5 else 'Mine', mine_prediction.item()))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
